A process for combining the results of several classifiers.

A process for successfully combining the results of several classifiers provides a method for
calculating confidences of each classification decision for every classifier involved. Confidences
are then combined according to the Dempster-Shafer Theory of Evidence. Initially, basic probability
assignments for each of the classifiers are calculated and used to calculate confidences for each
classifier. The confidences for all of the classifiers are then combined. The combined confidences
are then used to determine a class for the data input to the classifiers.

Technical Field

The present invention is directed to the field of pattern classification and recognition and, more
particularly, to a mechanism for calculating a confidence of any classification decision.

Background Art 

Pattern recognition problems, such as classification of machine or hand-printed characters, are
currently solved with some accuracy by using traditional classifiers or neural networks of different
types. It is easier in many cases to apply several classifiers to the same recognition task to
improve recognition performance in a combined system, instead of inventing a new architecture or a
feature extractor to achieve the same accuracy. However, it is necessary to assign a measure of
evidence to a classification decision of each classifier in a system, to be able to combine them.
Unfortunately, such assignments demand numerous approximations, especially when the number of
classes is large. This creates computational difficulties, and can decrease the quality of the
recognition performance.

It is known in the art to use the Dempster-Shafer Theory of Evidence as a tool for representing and
combining measures of evidence. In the existing art, U.S. Patent No. 5,123,057 uses the
Dempster-Shafer Theory of Evidence to calculate a degree of match between data events portions and
model parts. Evidence is collected and processed after preliminary matching has been performed using
the Dempster-Shafer Theory of Evidence.

Additionally, U.S. Patent No. 5,077,807 to Bokser describes a method for processing input feature
vectors. Again, the '807 patent relates to a preprocessing means for pattern recognition. So although
the prior art addresses the problem of pattern classification and recognition, the existing art does
not address the use of the Dempster-Shafer Theory of Evidence as a post recognition or postprocessing
tool to combine results of several classifiers.

It is seen then that it would be desirable to have a means for improving the results of
classification.

Summary of the Invention

The present invention improves the results of classification by successfully combining the results
of several classifiers. The invention also improves the results of classification by eliminating
numerous approximations which result in a decreased accuracy and quality of the classification and
recognition performance.

Specifically, the present invention uses a distance measure between a classifier output vector and a
mean vector for a subset of training data corresponding to each class. Using these distances for
basic probability assignments in the framework of the Dempster-Shafer Theory of Evidence, evidences
for all classification decisions for each classifier can be calculated and combined.

In accordance with one aspect of the present invention, a method for combining the results of several
classifiers comprises a series of steps. Basic probability assignments are calculated for each of the
classifiers and used to calculate confidences for each classifier. The confidences for all of the
classifiers are then combined. Finally, the combined confidences are used to determine a class for
data input to the classifiers.

Accordingly, it is an object of the present invention to provide a mechanism for calculating a confidence 
of any classification decision, which can increase the quality of recognition performance. Other objects and 
advantages of the invention will be apparent from the following description, the accompanying drawings and 
the appended claims. 

Brief Description of the Drawings 

Fig. 1 is a block diagram illustrating the combination of several classifiers; and

Fig. 2 is a flowchart illustrating the steps employed to achieve the combination of the several
classifiers, as depicted in Fig. 1.

Detailed Description of the Preferred Embodiments 

The present invention relates to a mechanism for combining the results of several classifiers.
Referring to Fig. 1, a block diagram 10 illustrates the combination of several classifiers. In
Fig. 1, three classifiers 12, 14, and 16 are shown for descriptive purposes. However, the concept of
the present invention will be applicable to a system having any number of classifiers, from one to N.

As can be seen in Fig. 1, data from data block 18 is input to each classifier 12, 14, and 16. The
present invention allows for the determination of the class of data from data block 18 being input to
the classifiers 12, 14, and 16, based on the output of the classifiers to a voter block 20.

Referring now to Fig. 2 and continuing with Fig. 1, inside each classifier block 12, 14, and 16, a
basic probability assignment is calculated, as indicated by block 22 of Fig. 2. The basic probability
assignments are used to calculate confidences for each classifier, as indicated at block 24. The
confidences output from each classifier block are combined at the voter block 20, as indicated at
block 26. The voter block 20 uses the outputs of the classifiers as its input, can calculate
confidence, i.e., evidence, for the output of each classifier, and combines these confidences. The
confidence or evidence for each classifier is a measure of the correctness of the answer that the
classifier produces. As indicated at step 28 of Fig. 2, a classification is then made, identifying
the input from the data block 18 for the classifier which is to be recognized, based on the combined
confidences from the voter block 20. The output of the voter block 20, then, is the result of the
classification process of the present invention. The present invention provides improved results by
taking into account the different decisions of different classifiers.

In a preferred embodiment, the basic probability assignments are calculated using a distance measure
in accordance with the Dempster-Shafer Theory of Evidence. The confidences and the combinations are
also calculated according to the Dempster-Shafer Theory of Evidence. Using the distance measures for
basic probability assignments in the framework of the Dempster-Shafer Theory of Evidence, evidences
for all classification decisions for each classifier can be calculated and combined.

In applying the teachings of the present invention, assume $\{x_k\}$ to be a subset of the training
data corresponding to a class k. In addition, assume $E_k^n$ to be a mean vector for a set
$\{f_n(x_k)\}$ for each classifier $f_n$ and each class k. Then $E_k^n$ is a reference vector for
each class k and

$$d_k^n = \phi(E_k^n, y^n)$$

is a distance measure between $E_k^n$ and $y^n$, where $y^n = f_n(x)$ is the output vector of
classifier $f_n$ for an input vector x.

This distance measure can be used to calculate the basic probability assignments of block 22 in
Fig. 2, in accordance with the Dempster-Shafer Theory of Evidence.
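
The reference vectors described above can be illustrated with a minimal sketch. It assumes the
outputs of one classifier on labelled training data are already available as plain lists; the
function name `class_mean_vectors` and the sample values are hypothetical and are not part of the
specification.

```python
from collections import defaultdict


def class_mean_vectors(outputs, labels):
    """Return {class label: mean of the classifier output vectors for that class}."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(outputs, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate the component-wise sum of the output vectors of this class.
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Hypothetical outputs of one classifier on four training samples, two classes.
outputs = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8], [0.0, 1.0]]
labels = ["a", "a", "b", "b"]
means = class_mean_vectors(outputs, labels)
```

Each classifier would get its own dictionary of reference vectors, computed from its own outputs on
the same training subsets.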

According to the Dempster-Shafer Theory of Evidence, consider a frame of discernment

$$\Theta = \{\theta_1, \ldots, \theta_K\},$$

where $\theta_k$ is the hypothesis that "a vector $y^n$ is of the class k". For any classifier $f_n$
and each class k, a distance measure $d_i^n$ can represent evidence in support of hypothesis
$\theta_k$ if i = k, and in support of $\neg\theta_k$, or against $\theta_k$, if i is not equal to k.

With $\Theta$ as the frame of discernment, $2^\Theta$ denotes the set of all subsets of $\Theta$. A
function m is called a basic probability assignment if

$$m: 2^\Theta \to [0, 1], \qquad m(\emptyset) = 0, \qquad \text{and} \qquad \sum_{A \subseteq \Theta} m(A) = 1,$$

where m(A) represents the exact belief in the hypothesis A. Therefore, if m is a basic probability
assignment, then a function $\mathrm{Bel}: 2^\Theta \to [0, 1]$ satisfying

$$\mathrm{Bel}(B) = \sum_{A \subseteq B} m(A)$$

is called a belief function.

There is a one-to-one correspondence between the belief function and the basic probability
assignment. If $m_1$ and $m_2$ are basic probability assignments on $\Theta$, their combination or
orthogonal sum, $m = m_1 \oplus m_2$, is defined as

$$m(A) = C^{-1} \sum_{B \cap D = A} m_1(B)\, m_2(D), \qquad C = \sum_{B \cap D \neq \emptyset} m_1(B)\, m_2(D),$$

where $m(\emptyset) = 0$ and $A \neq \emptyset$.

Since there is the one-to-one correspondence between Bel and m, the orthogonal sum of belief
functions

$$\mathrm{Bel} = \mathrm{Bel}_1 \oplus \mathrm{Bel}_2$$

is defined in the obvious way.
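
The orthogonal sum can be sketched in a few lines of code. This is an illustration only, assuming
each basic probability assignment is stored as a dictionary from a `frozenset` of hypotheses to its
mass; the function name `dempster_combine` and the example masses are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Orthogonal sum of two basic probability assignments.

    Each assignment maps a frozenset (a subset of the frame) to its mass.
    """
    combined = {}
    for (b, mass_b), (d, mass_d) in product(m1.items(), m2.items()):
        a = b & d
        if a:  # empty intersections are conflict and receive no mass
            combined[a] = combined.get(a, 0.0) + mass_b * mass_d
    c = sum(combined.values())  # normalising constant C
    if c == 0.0:
        raise ValueError("total conflict: the orthogonal sum is undefined")
    return {a: mass / c for a, mass in combined.items()}


# Hypothetical frame with three hypotheses and two simple assignments.
theta = frozenset({"k1", "k2", "k3"})
m1 = {frozenset({"k1"}): 0.6, theta: 0.4}
m2 = {frozenset({"k2", "k3"}): 0.5, theta: 0.5}
m = dempster_combine(m1, m2)
```

In this example the product of the masses on `{"k1"}` and `{"k2", "k3"}` is conflict, so it is
excluded from C and the remaining masses are rescaled to sum to one.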

Special kinds of belief functions are very good at representing evidence. These functions are called
simple and separable support functions. Bel is a simple support function if there exists an
$F \subseteq \Theta$, called the focus of Bel, such that $\mathrm{Bel}(\Theta) = 1$ and
$\mathrm{Bel}(A) = s$ if both conditions, $F \subseteq A$ and $A \neq \Theta$, hold, where s is
called Bel's degree of support. Otherwise, $\mathrm{Bel}(A) = 0$. A separable support function is
either a simple support function or an orthogonal sum of simple support functions. Separable support
functions are very useful when it is desired to combine evidences from several sources. If Bel is a
simple support function with focus $F \neq \Theta$, then $m(F) = s$, $m(\Theta) = 1 - s$, and m is 0
elsewhere.

Let F be a focus for two simple support functions with degrees of support $s_1$ and $s_2$,
respectively. If $\mathrm{Bel} = \mathrm{Bel}_1 \oplus \mathrm{Bel}_2$, then
$m(F) = 1 - (1 - s_1)(1 - s_2)$, $m(\Theta) = (1 - s_1)(1 - s_2)$, and m is zero elsewhere.
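
As a purely illustrative numerical check of this property (the degrees of support are assumed
values, not taken from the specification), let $s_1 = 0.6$ and $s_2 = 0.5$:

$$m(F) = 1 - (1 - 0.6)(1 - 0.5) = 1 - 0.2 = 0.8, \qquad m(\Theta) = (1 - 0.6)(1 - 0.5) = 0.2.$$

The two masses sum to one, and the combined support 0.8 exceeds either individual degree of support,
which is the expected effect of two sources agreeing on the same focus.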

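Knowing these properties of the simple belief function, then $d_k^n$ can be used as a degree of
support for the Bel with focus $\theta_k$ if i = k. Also, $d_i^n$ are degrees of support for Bel with
focus $\neg\theta_k$ if i is not equal to k. This yields the probability assignments

$$m_k(\theta_k) = d_k^n, \qquad m_k(\Theta) = 1 - d_k^n,$$

and

$$m_{\neg k}(\neg\theta_k) = 1 - \prod_{i \neq k} (1 - d_i^n), \qquad m_{\neg k}(\Theta) = \prod_{i \neq k} (1 - d_i^n).$$

Combining all of the knowledge about focus, the evidence can be obtained for class k and classifier n
as:

$$e_k(y^n) = (m_k \oplus m_{\neg k})(\theta_k).$$

Expanding on this equation yields:

$$e_k(y^n) = \frac{d_k^n \prod_{i \neq k} (1 - d_i^n)}{1 - d_k^n \left[ 1 - \prod_{i \neq k} (1 - d_i^n) \right]}.$$
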
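
The expanded expression for the evidence of one classifier can be sketched as follows. The sketch
assumes the distance measures of a single classifier are given as a list `d` with values in [0, 1],
indexed by class; the function name `class_evidence` and the sample values are hypothetical.

```python
from math import prod


def class_evidence(d, k):
    """Evidence e_k for class index k from the distances d[i] of one classifier.

    Implements e_k = d_k * P / (1 - d_k * (1 - P)), where P is the product of
    (1 - d_i) over all i != k. Each d[i] is assumed to lie in [0, 1].
    """
    p = prod(1.0 - d_i for i, d_i in enumerate(d) if i != k)
    denominator = 1.0 - d[k] * (1.0 - p)
    if denominator == 0.0:
        raise ValueError("total conflict between the two assignments")
    return d[k] * p / denominator


d = [0.8, 0.3, 0.1]  # hypothetical distance measures for three classes
evidences = [class_evidence(d, k) for k in range(len(d))]
```

The denominator is the normalising constant of the orthogonal sum: it removes the mass that would
otherwise fall on the contradictory combination of "class k" and "not class k".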

Finally, evidences for all classifiers may be combined according to the Dempster-Shafer rule to
obtain a measure of confidence for each class k for the input vector x,

$$e_k(x) = e_k(y^1) \oplus \cdots \oplus e_k(y^N).$$



Since



In this case, then

$$e_k(x) = \prod_{n=1}^{N} e_k(y^n).$$

Now, a class m can be assigned to an input vector if

$$e_m = \max_{1 \le k \le K} e_k.$$
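
The combination across classifiers and the final class assignment can be sketched as below. This is
a minimal illustration, assuming the per-classifier evidences have already been computed and are
stored as one list per classifier; the function name `combine_and_classify` and the numbers are
hypothetical.

```python
from math import prod


def combine_and_classify(evidence_per_classifier):
    """Multiply per-class evidences across classifiers and pick the largest.

    evidence_per_classifier[n][k] is the evidence of classifier n for class k.
    Returns (index of the winning class, list of combined evidences).
    """
    num_classes = len(evidence_per_classifier[0])
    combined = [
        prod(evidence[k] for evidence in evidence_per_classifier)
        for k in range(num_classes)
    ]
    winner = max(range(num_classes), key=lambda k: combined[k])
    return winner, combined


# Hypothetical evidences from three classifiers over three classes.
evidence_per_classifier = [
    [0.70, 0.20, 0.05],
    [0.40, 0.50, 0.10],
    [0.60, 0.30, 0.10],
]
winner, combined = combine_and_classify(evidence_per_classifier)
```

In this example the second classifier alone would prefer class 1, but the product over all three
classifiers favours class 0, which illustrates how the voter can overrule a single dissenting
classifier.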

In accordance with the present invention, there are two almost equivalent best functions for a
distance between $E_k^n$ and $y^n$.

One of these distance measures is a cosine between a classifier output vector and a mean vector for a
subset of training data corresponding to each class, or $\cos^2(\alpha_k^n)$, where $\alpha_k^n$ is
an angle between $E_k^n$ and $y^n$, or:

$$d_k^n = \phi(E_k^n, y^n) = \frac{(E_k^n \cdot y^n)^2}{\|E_k^n\|^2 \, \|y^n\|^2}.$$
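
The cosine-based measure can be sketched as follows. The sketch assumes the measure is the squared
cosine of the angle between the two vectors, which keeps the value in [0, 1] as a degree of support
requires; the function name `cos_squared` and the sample vectors are hypothetical.

```python
def cos_squared(e, y):
    """Squared cosine of the angle between reference vector e and output vector y."""
    dot = sum(a * b for a, b in zip(e, y))
    norm_e_sq = sum(a * a for a in e)
    norm_y_sq = sum(b * b for b in y)
    if norm_e_sq == 0.0 or norm_y_sq == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    return dot * dot / (norm_e_sq * norm_y_sq)


d_same = cos_squared([1.0, 0.0], [2.0, 0.0])  # parallel vectors
d_orth = cos_squared([1.0, 0.0], [0.0, 3.0])  # orthogonal vectors
```

Parallel vectors give the maximum support and orthogonal vectors give none, regardless of vector
length.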



The other distance measure is a particular function of Euclidean distance between a classifier output
vector and a mean vector for a subset of training data corresponding to each class, or of Euclidean
distance between $E_k^n$ and $y^n$, and:

$$d_k^n = \phi(E_k^n, y^n) = \frac{(1 + \|E_k^n - y^n\|^2)^{-1}}{\sum_{i=1}^{K} (1 + \|E_i^n - y^n\|^2)^{-1}}.$$
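
The Euclidean-based measure can be sketched as below. The sketch assumes the measure for class k is
the inverse of one plus the squared Euclidean distance to that class's reference vector, normalised
by the sum of the same quantity over all classes; the function name `euclidean_measures` and the
sample vectors are hypothetical.

```python
def euclidean_measures(reference_vectors, y):
    """Normalised inverse-distance measure of y against every reference vector."""
    inverse = [
        1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(e, y)))
        for e in reference_vectors
    ]
    total = sum(inverse)  # always positive, since every term is in (0, 1]
    return [value / total for value in inverse]


reference_vectors = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical class means
d = euclidean_measures(reference_vectors, [1.0, 0.0])
```

Under this assumption the measures of one classifier sum to one across classes, and the class whose
reference vector is nearest to the output vector receives the largest value.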

The present invention provides a simple and effective method for calculation of basic probability
assignments, which permits confidences for individual classification decisions to be obtained. The
process and method of the present invention can be used for the estimation of confidence for several
classifiers and the combination of the results of several classifiers.

Industrial Applicability and Advantages 

The present invention is useful in the field of pattern classification and recognition and has the
advantage of using the results of different classifiers to improve classification and recognition
performance. The present invention has the further advantage of eliminating approximations, which can
be particularly prohibitive in cases where the number of classes is large. This, in turn, increases
the quality of the classification and recognition performance. Simple and effective calculation of
basic probability assignments permits confidences for individual classification decisions to be
obtained. The present invention has the further advantage of allowing for the estimation of
confidence for several classifiers and the combination of the results of several classifiers.

Finally, the present invention has the advantage of being applicable to any traditional statistical
classifiers or neural networks of different architectures and based on different sets of features, as
well as to different applications in which calculations of confidence for each classification
decision are necessary.

Having described the invention in detail and by reference to the preferred embodiment thereof, it
will be apparent that other modifications and variations are possible without departing from the
scope of the invention defined in the appended claims.

The invention is summarized as follows:

1. A method for combining results of several classifiers comprises the steps of:

calculating basic probability assignments for each of the classifiers;

using the basic probability assignments to calculate confidences for each classifier;

combining the confidences for all of the classifiers; and

using the combined confidences to determine a class for data input to the classifiers.

2. A method according to 1 wherein the steps of calculating basic probability assignments and using
the basic probability assignments to calculate confidences for each classifier comprise the step of
applying a suitable theory of evidence.

3. A method according to 2 wherein the suitable theory of evidence comprises the Dempster-Shafer
Theory of Evidence.

4. A method according to 1 wherein the step of using the basic probability assignments to calculate
confidences for each classifier comprises the steps of:

using a distance measure between a classifier output vector and a mean vector for a subset of
training data corresponding to each class; and

calculating evidences for all classification decisions for each classifier, using the distances as
basic probability assignments.

5. A method according to 4 wherein the distance measure comprises one of two almost equivalent
distance measures.

6. A method according to 5 wherein the first distance measure comprises a cosine between a classifier
output vector and a mean vector for a subset of training data corresponding to each class.

7. A method according to 5 wherein the second distance measure comprises a function of Euclidean
distance between a classifier output vector and a mean vector for a subset of training data
corresponding to each class.

8. A method according to 1 wherein the step of combining the confidences comprises the step of combining 
the confidences according to the Dempster-Shafer Theory of Evidence. 

9. A system for combining the results of several classifiers comprising:

means for calculating basic probability assignments for each of the classifiers;

means for calculating confidences for each classifier from the basic probability assignments;

means for combining the confidences for all of the classifiers; and

means for determining a class for data input to the classifiers from the combined confidences.

10. A system according to 9 wherein the means for calculating basic probability assignments and
confidences for each classifier comprise a suitable theory of evidence.

11. A system according to 10 wherein the suitable theory of evidence comprises the Dempster-Shafer
Theory of Evidence.

12. A system according to 9 wherein the means for calculating confidences for each classifier
comprises:

a distance measure between a classifier output vector and a mean vector for a subset of training data
corresponding to each class; and

means for calculating evidences for all classification decisions for each classifier, using the
distances as basic probability assignments.

13. A system according to 12 wherein the distance measure comprises one of two almost equivalent
distance measures.

14. A system according to 13 wherein the first distance measure comprises a cosine between a
classifier output vector and a mean vector for a subset of training data corresponding to each class.

15. A system according to 13 wherein the second distance measure comprises a function of Euclidean
distance between a classifier output vector and a mean vector for a subset of training data
corresponding to each class.



Claims 

1. A method for combining results of several classifiers comprises the steps of:
calculating basic probability assignments for each of the classifiers;
using the basic probability assignments to calculate confidences for each classifier;
combining the confidences for all of the classifiers; and
using the combined confidences to determine a class for data input to the classifiers.

2. A method as claimed in claim 1 wherein the steps of calculating basic probability assignments and
using the basic probability assignments to calculate confidences for each classifier comprise the
step of applying a suitable theory of evidence.

3. A method as claimed in claim 1 wherein the step of using the basic probability assignments to
calculate confidences for each classifier comprises the steps of:
using a distance measure between a classifier output vector and a mean vector for a subset of
training data corresponding to each class; and
calculating evidences for all classification decisions for each classifier, using the distances as
basic probability assignments.

4. A method as claimed in claim 3 wherein the distance measure comprises one of two almost equivalent
distance measures.

5. A method as claimed in claim 4 wherein the first distance measure comprises a cosine between a
classifier output vector and a mean vector for a subset of training data corresponding to each class.

6. A system for combining the results of several classifiers comprising:
means for calculating basic probability assignments for each of the classifiers;
means for calculating confidences for each classifier from the basic probability assignments;
means for combining the confidences for all of the classifiers; and
means for determining a class for data input to the classifiers from the combined confidences.

7. A system as claimed in claim 6 wherein the means for calculating basic probability assignments and
confidences for each classifier comprise a suitable theory of evidence.

8. A system as claimed in claim 7 wherein the means for calculating confidences for each classifier
comprises:
a distance measure between a classifier output vector and a mean vector for a subset of training data
corresponding to each class; and
means for calculating evidences for all classification decisions for each classifier, using the
distances as basic probability assignments.

9. A system as claimed in claim 8 wherein the distance measure comprises one of two almost equivalent
distance measures.

10. A system as claimed in claim 9 wherein the second distance measure comprises a function of
Euclidean distance between a classifier output vector and a mean vector for a subset of training data
corresponding to each class.